Abstract

In this paper, we present a simple non-parametric method for learning the
structure of undirected graphs from data drawn from an unknown underlying
distribution. We propose to use Brownian distance covariance to estimate the
conditional independences between the random variables, which encode a pairwise
Markov graph. This framework can be applied in the high-dimensional setting,
where the number of parameters may be much larger than the sample size.



1 Introduction 

Undirected graphical models, also known as Markov random fields or Markov networks, 
have become a part of the mainstream of statistical theory and application in recent 
years. These models use graphs to represent conditional independences among sets of 
random variables. In these graphs, the absence of an edge between two vertices means the 
corresponding random variables are conditionally independent, given the other variables. 
Learning the structure of a graph is equivalent to learning if there exists an edge between 
every pair of nodes in the graph. 

In the past decade, significant progress has been made on designing efficient algorithms
to learn undirected graphs from high-dimensional observational datasets. Most of these
methods are based on either penalized maximum-likelihood estimation or penalized
regression. Work has focused on the problem of estimating the graph in this
high-dimensional setting, which becomes feasible if the graph is sparse.
[Friedman (2007)] develop an efficient algorithm for computing the estimator with
excellent theoretical properties using a graphical version of the lasso.
In high-dimensional problems, the normality assumption is the main constraint of these
methods. The Gaussian constraint can be replaced with a semi-parametric Gaussian copula,
as discussed in [Liu, et al (2009)]. This method uses a semi-parametric Gaussian copula,
or non-paranormal, approach for high-dimensional inference, replacing linear functions
with a set of one-dimensional smooth functions. The non-paranormal extends the
normal by transforming the variables by smooth functions.

The Gaussian distribution is almost always used in this area: because it represents
at most second-order relationships, it automatically encodes a pairwise Markov graph.
The Gaussian distribution has the property that if the ij-th component of the inverse
covariance matrix is zero, then variables i and j are conditionally independent,
given the other variables.

Distance correlation is a measure of dependence between random vectors introduced
by [Rizzo, and Bakirov (2007)]. For all distributions with finite first moments, the
distance correlation is zero if and only if the random vectors are independent. We use
this property to construct our method.

In this paper, we discuss the structure learning of Markov graphs with large-dimensional
covariance matrices, where the number of variables is not small compared to
the sample size. It is well known that in such situations the usual estimator, the sample
covariance matrix, may not be invertible. The approach suggested here is to use the
distance covariance matrix instead of this matrix.



2 The proposed method 

In this paper we are concerned with the task of estimating the graph structure of a
Markov random field over a random vector $X = (X_1, X_2, \ldots, X_p)$, given n independent
and identically distributed samples.

It has been shown that the distance covariance is zero if and only if the two vectors
are independent. We use this property.

The main idea is to create a matrix from the distance correlation of each pair of
variables, and then to use it to construct an adjacency matrix of conditional
independences between each pair of nodes.
The distance dependence statistics in [Rizzo, and Bakirov (2007)] are defined as follows.
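For a random sample $(\mathbf{X}, \mathbf{Y}) = \{(X_k, Y_k) : k = 1, \ldots, n\}$ of n i.i.d. random vectors (X,
Y) from the joint distribution of random vectors X in $\mathbb{R}^p$ and Y in $\mathbb{R}^q$, compute the
Euclidean distance matrices $(a_{kl}) = (|X_k - X_l|_p)$ and $(b_{kl}) = (|Y_k - Y_l|_q)$.

Define

$$A_{kl} = a_{kl} - \bar{a}_{k\cdot} - \bar{a}_{\cdot l} + \bar{a}_{\cdot\cdot}, \qquad k, l = 1, \ldots, n,$$

where

$$\bar{a}_{k\cdot} = \frac{1}{n}\sum_{l=1}^{n} a_{kl}, \qquad \bar{a}_{\cdot l} = \frac{1}{n}\sum_{k=1}^{n} a_{kl}, \qquad \bar{a}_{\cdot\cdot} = \frac{1}{n^2}\sum_{k,l=1}^{n} a_{kl}. \tag{1}$$

Similarly define $B_{kl} = b_{kl} - \bar{b}_{k\cdot} - \bar{b}_{\cdot l} + \bar{b}_{\cdot\cdot}$, $k, l = 1, \ldots, n$.
Then the sample distance correlation dcor(X, Y) is defined by

$$\mathrm{dcor}(X, Y) = \frac{\mathrm{dcov}(X, Y)}{\sqrt{\mathrm{dcov}(X, X)\,\mathrm{dcov}(Y, Y)}}, \tag{2}$$

where

$$\mathrm{dcov}(X, Y) = \sqrt{\frac{1}{n^2}\sum_{k,l=1}^{n} A_{kl} B_{kl}} \tag{3}$$

is the distance covariance.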
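The definition above can be sketched directly in R with double-centred distance matrices. This is only an illustrative sketch: the helper names centre_dist, dcov_sketch and dcor_sketch are hypothetical and do not come from the paper or from the energy package, and the convention of returning 0 when a distance variance vanishes is an assumption.

```r
# Double-centre a distance matrix: A_kl = a_kl - rowmean_k - colmean_l + grandmean
centre_dist <- function(d) {
  d - rowMeans(d)[row(d)] - colMeans(d)[col(d)] + mean(d)
}

# Sample distance covariance: sqrt of the mean of A_kl * B_kl (hypothetical helper)
dcov_sketch <- function(x, y) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  n <- nrow(x)
  stopifnot(nrow(y) == n)
  A <- centre_dist(as.matrix(dist(x)))   # Euclidean distances between rows of x
  B <- centre_dist(as.matrix(dist(y)))   # Euclidean distances between rows of y
  sqrt(max(sum(A * B), 0)) / n           # max() guards against rounding below zero
}

# Sample distance correlation (hypothetical helper)
dcor_sketch <- function(x, y) {
  vx <- dcov_sketch(x, x)
  vy <- dcov_sketch(y, y)
  if (vx * vy == 0) return(0)            # assumed convention for degenerate samples
  dcov_sketch(x, y) / sqrt(vx * vy)
}

set.seed(1)
x <- rnorm(50)
y <- x^2 + rnorm(50, sd = 0.1)           # nonlinear dependence on x
dcor_sketch(x, y)
```

The sketch stores full n by n matrices, so it trades memory for brevity; the loop-based function in the Appendix avoids that storage.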

Although the distance covariance has a simple computing formula, the computation
requires $O(n^2)$ operations, which can be burdensome for large n.

This estimator is distribution-free and has a simple explicit formula that is easy to com- 
pute and interpret. 

In our study this formula becomes simpler, because here p = q = 1. For example, in the
procedure that forms the distance correlation matrix, the distances $a_{kl}$ and $b_{kl}$ are
computed only once for each vector. Furthermore, $\sum_{k,l=1}^{n}$ is changed to
$2\sum_{k<l}$, which requires less computing time.
In Appendix [A] we give a function coded in R that performs this calculation.
The distance correlation is also implemented in the R package energy [2].

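We construct a matrix R of sample distance correlations (dcor) between each pair of
nodes, so that the element (i, j) of R is equal to $\mathrm{dcor}(X_i, X_j)$:

$$R = \begin{pmatrix}
1 & \mathrm{dcor}(X_1, X_2) & \cdots & \mathrm{dcor}(X_1, X_p) \\
\mathrm{dcor}(X_2, X_1) & 1 & \cdots & \mathrm{dcor}(X_2, X_p) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{dcor}(X_p, X_1) & \mathrm{dcor}(X_p, X_2) & \cdots & 1
\end{pmatrix} \tag{4}$$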
Figure 1: The logarithm of the determinant of the correlation matrices in different dimensions.



We call the matrix R defined above the distance correlation matrix. 
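As an illustration of how such a matrix might be assembled, the sketch below fills the upper triangle pair by pair and mirrors it. The helper names dcor_vec and dcor_matrix are hypothetical, the data are simulated, and dcor_vec is a compact univariate version written for this example rather than the authors' function.

```r
# Compact sample distance correlation for two numeric vectors (p = q = 1).
# dcor_vec is a hypothetical helper written for this illustration.
dcor_vec <- function(x, y) {
  centre <- function(v) {
    d <- abs(outer(v, v, "-"))                        # pairwise distances
    d - rowMeans(d)[row(d)] - colMeans(d)[col(d)] + mean(d)
  }
  A <- centre(x)
  B <- centre(y)
  # The 1/n factors of the distance covariances cancel in the ratio
  sqrt(max(sum(A * B), 0)) / sqrt(sqrt(sum(A * A)) * sqrt(sum(B * B)))
}

# Distance correlation matrix of the columns of X (hypothetical helper)
dcor_matrix <- function(X) {
  p <- ncol(X)
  R <- diag(1, p)                                     # dcor(X_i, X_i) = 1
  for (i in seq_len(p - 1)) {
    for (j in (i + 1):p) {
      R[i, j] <- R[j, i] <- dcor_vec(X[, i], X[, j])  # symmetric fill
    }
  }
  R
}

set.seed(1)
X <- matrix(rnorm(30 * 4), nrow = 30, ncol = 4)       # simulated data, 30 samples
R <- dcor_matrix(X)
round(R, 3)
```

Only p(p - 1)/2 distance correlations are computed, since the remaining entries follow from symmetry.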

Since dcor(Y, X) = dcor(X, Y), R is a symmetric matrix.
When the matrix dimension p is larger than the number n of available observations, the
ordinary sample covariance matrix is not invertible. The distance covariance or distance
correlation matrix performs better in this respect.
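The singularity of the ordinary sample estimate when p > n can be checked numerically. The short sketch below uses simulated Gaussian data and only examines the sample correlation matrix; the sizes n = 5 and p = 10 are arbitrary choices for the illustration.

```r
set.seed(1)
n <- 5                               # number of observations
p <- 10                              # number of variables, larger than n
X <- matrix(rnorm(n * p), nrow = n)

S <- cor(X)                          # ordinary sample correlation matrix (p x p)
rank_S <- qr(S)$rank                 # numerical rank; at most n - 1 after centring
rank_S
determinant(S, logarithm = TRUE)$modulus   # log|det|, very negative for a singular S
```

Because centring removes one degree of freedom, the rank cannot exceed n - 1, so the matrix cannot be inverted reliably when p > n.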

We generate three different random data sets in 2 to 100 dimensions, and compute the
average of the determinants of the correlation and distance correlation matrices in each
dimension. As the results in Figure [1] show, the distance correlation matrix is better
suited to inversion and, computationally, is sufficiently far from singular.



Now we construct the partial correlation matrix. Let $r_{ij|\mathrm{REST}}$ be the partial
correlation between the variables $X_i$ and $X_j$, given all the remaining variables, and let
$P = R^{-1}$, $P = (p_{ij})$, be the inverse of the correlation matrix. Then [Whittaker (1990)]
the partial correlation is given by

$$r_{ij|\mathrm{REST}} = -\frac{p_{ij}}{\sqrt{p_{ii}\,p_{jj}}}. \tag{5}$$






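The matrix of partial correlations can be calculated by simple computations. As
implemented in the partial.cor function of the R package Rcmdr [1], the following R code
does this; in the code, RI holds the inverse of R and P holds the partial correlations
of equation (5).

RI <- solve(R)            # RI is the inverse of R
D <- 1 / sqrt(diag(RI))
P <- -RI * (D %o% D)      # %o% is the outer product operator
diag(P) <- 0              # diagonal value assumed to be 0; it plays no role in edge selection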



Finally, P is the sparse structure matrix of the graph.
We can compare each element of R to a tuning parameter (forming the paths) and derive
the desired adjacency matrix.
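The thresholding step might look like the sketch below. The helper name threshold_adjacency is hypothetical, the rule "keep an edge when the absolute entry exceeds tp" is an assumption about how the tuning parameter is applied, and the 3 by 3 matrix is made up for the illustration.

```r
# Turn a symmetric dependence matrix M into a 0/1 adjacency matrix.
# Assumption: an edge (i, j) is kept when |M[i, j]| exceeds the tuning parameter tp.
threshold_adjacency <- function(M, tp) {
  A <- (abs(M) > tp) * 1L   # logical comparison converted to 0/1
  diag(A) <- 0L             # no self-loops
  A
}

# Made-up symmetric matrix for three nodes
M <- matrix(c(1.0, 0.8, 0.1,
              0.8, 1.0, 0.3,
              0.1, 0.3, 1.0), nrow = 3, byrow = TRUE)

A <- threshold_adjacency(M, tp = 0.25)
A
sum(A) / 2                  # number of undirected edges
```

Raising tp removes edges and lowering it adds them, so tp controls the sparsity of the estimated graph.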

3 Simulation Results 

In this simulation, we demonstrate the performance of the proposed approach in finding
the sparse structures of random Markov networks, by generating Erdős-Rényi random
graphs.

The Erdős-Rényi random graph $G_p$ is a graph on p nodes in which the probability of an
edge being in the graph is c/p and the edges are generated independently. In this random
graph, the average degree of a node is approximately c.
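A minimal sketch of generating such a random graph in base R follows. The helper name er_graph and its argument names are hypothetical, and the edge probability avg_deg / p is the assumption that makes the expected degree roughly avg_deg.

```r
# Erdos-Renyi adjacency matrix on p nodes; each edge is present independently
# with probability avg_deg / p (er_graph is a hypothetical helper).
er_graph <- function(p, avg_deg) {
  A <- matrix(0L, p, p)
  upper <- upper.tri(A)
  A[upper] <- rbinom(sum(upper), size = 1, prob = avg_deg / p)  # one draw per pair
  A + t(A)                                                      # symmetrise
}

set.seed(1)
A <- er_graph(p = 50, avg_deg = 3)
mean(rowSums(A))   # empirical average degree, expected to be near avg_deg
```

The exact expected degree is avg_deg * (p - 1) / p, which approaches avg_deg as p grows.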

Given the precision matrix of a zero-mean Gaussian distribution, it is easy to sample
data from the distribution. Here, however, we do not assume that the distribution is
known. We therefore randomly constructed precision matrices and set random linear
relationships with white noise between the columns of the data sample matrices.
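For the Gaussian case mentioned first, sampling from a given precision matrix can be sketched with a Cholesky factor. The helper name rgauss_precision and the small tridiagonal precision matrix are assumptions made for this illustration; this is not the generator used for the simulations in the paper.

```r
# Draw n samples from N(0, solve(Omega)) for a positive-definite precision
# matrix Omega (rgauss_precision is a hypothetical helper).
rgauss_precision <- function(n, Omega) {
  p <- nrow(Omega)
  U <- chol(Omega)                    # Omega = t(U) %*% U with U upper triangular
  Z <- matrix(rnorm(n * p), n, p)     # independent standard normal draws
  # Solving U x = z gives x with covariance solve(U) %*% t(solve(U)) = solve(Omega)
  t(backsolve(U, t(Z)))
}

# Made-up chain-structured precision matrix (diagonally dominant, hence positive definite)
Omega <- diag(2, 4)
Omega[cbind(1:3, 2:4)] <- 0.5
Omega[cbind(2:4, 1:3)] <- 0.5

set.seed(1)
X <- rgauss_precision(5000, Omega)
round(cov(X) - solve(Omega), 2)       # should be close to zero
```

Zeros in Omega correspond to missing edges, so the true graph of this example is the chain 1 - 2 - 3 - 4.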

In a similar manner to [Lin, et al (2009)], we simulated Erdős-Rényi random graphs
with two types of sparse structures (or precision matrices): 1) 50 nodes with an average
of 3 neighbours per node, and 2) 200 nodes with an average of 4 neighbours per node.
The goal here is to see how well our approach recovers the sparse structures of those
precision matrices given different numbers of sampled data.

Figure 2: The performance comparison between our proposed approach and the
non-paranormal approach.
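Recovery of a sparse structure can be scored by the Hamming distance between adjacency matrices, that is, the number of edges on which the estimate and the ground truth disagree. The sketch below is one way to compute it; the helper name hamming_distance and the two small graphs are hypothetical.

```r
# Number of disagreeing undirected edges between two symmetric 0/1 adjacency
# matrices (hamming_distance is a hypothetical helper).
hamming_distance <- function(A_hat, A_true) {
  stopifnot(all(dim(A_hat) == dim(A_true)))
  ut <- upper.tri(A_true)             # count each undirected pair once
  sum(A_hat[ut] != A_true[ut])
}

# Made-up example on three nodes
A_true <- matrix(c(0, 1, 0,
                   1, 0, 1,
                   0, 1, 0), 3, 3)    # edges (1,2) and (2,3)
A_hat  <- matrix(c(0, 1, 1,
                   1, 0, 0,
                   1, 0, 0), 3, 3)    # edges (1,2) and (1,3)

hamming_distance(A_hat, A_true)       # one missing edge plus one extra edge
```

Counting only the upper triangle avoids double counting, since each undirected edge appears twice in a symmetric adjacency matrix.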

Figure [2] illustrates the performance of our approach in recovering the structures of
different Markov networks in comparison with the non-paranormal approach as discussed
in [Liu, et al (2009)]. The performance is evaluated by the Hamming distance, the number
of disagreeing edges between an estimated network and the ground truth, when both
have an equal number of edges.

4 Experimental Results

In this section, we compare our algorithm with the non-paranormal method as discussed
in [Liu, et al (2009)], using the function huge.npn() implemented in the R package
huge [Zhao, et al (2012)] for estimating a semi-parametric Gaussian copula model
by truncated normal or normal score.

The example is based on stock market data contributed to the huge package, which gives
the closing prices of all stocks in the S&P 500 for all days on which the market
was open between January 1, 2003 and January 1, 2008. This gives 1258 samples for
the 452 stocks that remained in the S&P 500 during the entire time period. Here, for
convenience and better visibility, we select only the first 20 variables (stocks). Also,
instead of the force-based Fruchterman-Reingold graph drawing layout used by the plot
function in huge, we use the R package ggm [3] to visualize the graphs more distinctly.
The output of the huge package graph estimation, using the transformed data and the
preprocessing step mentioned in [Zhao, et al (2012)], is shown in Figure [3].
Figure 3: The output of huge, when nlambda = 40, lambda.min.ratio = 0.05.

Figure 4: The estimated graph paths using the non-paranormal method.

Figure 5: The estimated graph paths using our method, in three panels with tp = 0.935,
tp = 0.892 and tp = 0.760. The tuning parameter tp is chosen so that the number of edges
is close to that of the non-paranormal method plotted in Figure [4].

The data have been transformed by calculating the log-ratio of the price at time t to the
price at time t - 1, and then standardized by subtracting the mean and adjusting the
variance to one.
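This preprocessing can be sketched in a few lines of base R. The small price matrix below is made up for the illustration and is not the stock data from the huge package.

```r
# Made-up closing prices: rows are consecutive days, columns are two stocks
prices <- matrix(c(100, 102, 101, 105,
                    50,  49,  51,  52), ncol = 2)

log_ret <- diff(log(prices))   # log(P_t / P_{t-1}) for each column
Z <- scale(log_ret)            # subtract column means, divide by column sds

colMeans(Z)                    # approximately zero
apply(Z, 2, sd)                # equal to one
```

Taking log-ratios removes the price level of each stock, and the standardization puts all columns on a common scale before dependence is measured.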

To show the result more distinctly, we send the estimated output above to the R package
ggm [3] and plot the graph again in two different layouts, circle and Fruchterman-Reingold.
Figure [4] shows the results.
A Appendix



Simplified distance correlation 

The following R code calculates the simplified distance correlation between two vectors x, y.

dcor2 <- function(x, y) {
  n <- length(x)
  if (n != length(y)) stop("Sample sizes must be equal")

  # u[1, i] and u[2, i] accumulate the row sums of the distance matrices
  # of x and y; column n + 1 holds the grand means.
  u <- matrix(0, 2, n + 1)
  for (i in 1:n) {
    for (j in 1:n) {
      if (i < j) {
        # each pair of x values is visited once
        w <- abs(x[i] - x[j]); u[1, i] <- u[1, i] + w; u[1, j] <- u[1, j] + w
      } else if (i > j) {
        # each pair of y values is visited once
        w <- abs(y[i] - y[j]); u[2, i] <- u[2, i] + w; u[2, j] <- u[2, j] + w
      }
    }
  }
  u <- u / n
  u[1, n + 1] <- mean(u[1, 1:n])
  u[2, n + 1] <- mean(u[2, 1:n])

  r <- 0; rx <- 0; ry <- 0
  for (i in 1:n) {
    for (j in 1:n) {
      a <- abs(x[i] - x[j]) - u[1, i] - u[1, j] + u[1, n + 1]   # centred x distance
      b <- abs(y[i] - y[j]) - u[2, i] - u[2, j] + u[2, n + 1]   # centred y distance
      r <- r + a * b
      rx <- rx + a * a
      ry <- ry + b * b
    }
  }

  rx <- sqrt(rx) / n   # distance covariance of x with itself
  ry <- sqrt(ry) / n   # distance covariance of y with itself
  r <- sqrt(r) / n     # distance covariance of x and y
  r / sqrt(rx * ry)
}